
bpo-33966, multiprocessing: Pool wait for worker start #7966


Closed
wants to merge 1 commit

Conversation

vstinner
Member

@vstinner vstinner commented Jun 27, 2018

On Windows, if a worker is terminated too quickly, handles created by
DupHandle() in reduction.dump() remain open in the parent process,
causing a handle leak. Pool now uses events to wait until all new
workers have started, to make sure that the workers have closed the
handles inherited via DupHandle.

https://bugs.python.org/issue33966

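For illustration, here is a minimal sketch of the event-based idea (not the actual patch to Lib/multiprocessing/pool.py; the names _worker, start_workers and ready_event are made up): each worker sets an Event once it has started, and the parent blocks until every event is set before the pool is considered ready.

import multiprocessing

def _worker(ready_event, task_queue):
    # Tell the parent that this worker process has fully started.
    ready_event.set()
    # Process tasks until a None sentinel is received.
    for task in iter(task_queue.get, None):
        task()

def start_workers(ctx, processes):
    task_queue = ctx.SimpleQueue()
    events, workers = [], []
    for _ in range(processes):
        event = ctx.Event()
        proc = ctx.Process(target=_worker, args=(event, task_queue))
        proc.start()
        events.append(event)
        workers.append(proc)
    # This wait is what makes pool construction slower, but it guarantees
    # that no worker is terminated before it has finished starting.
    for event in events:
        event.wait()
    return workers, task_queue

if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    workers, queue = start_workers(ctx, 2)
    for _ in workers:        # send shutdown sentinels
        queue.put(None)
    for proc in workers:
        proc.join()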
@vstinner
Member Author

I tried a different approach avoiding DUPLICATE_CLOSE_SOURCE in PR #7965, but it caused test_wait() to hang and it changed the semantics (handles are no longer closed in the parent as soon as they are "stolen" by the child process).

@vstinner
Member Author

@pitrou: This bugfix seems to be the last remaining issue blocking my PR 7827 (bpo-18174: regrtest -R 3:3 checks for handle leak), which detects Windows handle leaks. Would you mind reviewing it?

Note: I chose to be conservative and set the event late, just before handling the first task in the worker. If we only want to fix https://bugs.python.org/issue18174, the event could be set earlier, at the entry of the worker() function; see the sketch below.
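To make the two placements concrete (an illustrative sketch only, not the actual worker() in Lib/multiprocessing/pool.py):

def worker(task_queue, started_event):
    # Earlier placement: setting started_event here, at the entry of the
    # function, would be enough if the goal is only to unblock the handle
    # leak check (bpo-18174).
    # started_event.set()

    first_task = True
    for task in iter(task_queue.get, None):
        if first_task:
            # Conservative placement chosen here: signal the parent only
            # just before handling the first task.
            started_event.set()
            first_task = False
        task()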

@pitrou
Member

pitrou commented Jun 27, 2018

The problem is that this will make the Pool constructor much slower (especially with the "spawn" method).
Without this PR:

>>> ctx = multiprocessing.get_context("spawn")
>>> %time ctx.Pool(1)
CPU times: user 574 µs, sys: 4.23 ms, total: 4.81 ms
Wall time: 4.34 ms
<multiprocessing.pool.Pool at 0x7fcc9450ed68>

@vstinner
Member Author

The problem is that this will make the Pool constructor much slower (especially with the "spawn" method).

I ran a benchmark on the master branch on Linux (Python compiled in release mode):

$ ./python -m perf timeit -s 'import multiprocessing; ctx = multiprocessing.get_context("spawn")' 'ctx.Pool(1)'

Result:

Mean +- std dev: [ref] 9.02 ms +- 0.50 ms -> [event] 43.0 ms +- 5.0 ms: 4.76x slower (+376%)

So yes, the creation of the Pool object becomes much slower. But if you actually use the Pool, the overall performance shouldn't change: currently, the worker startup time is simply delayed until the first use of the pool.

Do you think that the slowdown is an issue?

Would you prefer to wait until the worker has started before terminating it, to avoid the race condition?

Note: In case you missed it, I tried a different approach, but it changed the lifetime of handles in the parent process, so I abandoned it:
#7966 (comment)

@pitrou
Member

pitrou commented Jun 27, 2018

Yes, I think the slowdown is definitely an issue.
I wonder how important this whole issue is. A worker being terminated before it's fully initialized is an extremely rare condition. I think we can live with a handle leaking in that case.

@vstinner
Member Author

I wonder how important this whole issue is. A worker being terminated before it's fully initialized is an extremely rare condition. I think we can live with a handle leaking in that case.

It blocks my PR #7827, which would make it possible to catch bugs in other Python modules, like the ones I fixed in bz2 and lzma.

@vstinner vstinner closed this Sep 19, 2018
@vstinner vstinner deleted the mp_worker_event branch September 19, 2018 22:49